(57) Abstract: An interactive image searching and sorting method, system and computer
program are provided that make use of the three-dimensional relations among a plurality of
images. Calibrated cameras are used to capture images from a single scene. The user
interacts with the system by selecting a point of interest on an image. The system generates
a probabilistic model of the user's selection and determines the location of the object of
interest with the use of spatial likelihood and reliability functions that model the
likelihood of the object location and the error in the sensors. Based on user input, the
system iteratively processes the images and, based on the probabilistic model, ranks the
images to provide the 'best' view of the region of interest.
INTERACTIVE THREE-DIMENSIONAL SCENE-SEARCHING, IMAGE
RETRIEVAL AND OBJECT LOCALIZATION 

Background of the Invention 

With the explosion of images on the World Wide Web and the creation of online
databases for visual media, the ability to search through large numbers of images has
become ever more important. As a result, a great deal of research has been done in the field
of image searching. Most existing algorithms for image searching and retrieval use some
form of content-based algorithm in order to accomplish the goal, as particularized in
References [1]-[4] below. Many techniques use a color histogram to analyze and compare
the color contents of different images and sort them accordingly, as set out in Reference [5]
below. Other methods apply computer vision and image processing algorithms, such as
segmentation as per Reference [6] below, to the different images, and attempt to extract the
shape and contents of the components of the images. In the case of searching for images on
the web, since there is not a great deal of time to complete the search, most searching
algorithms search the content of the pages that are linked to the image for keywords. For
example, the authors in Reference [7] below develop a method for weighing the available
side-information for this purpose. There are still other methods; for example, in Reference
[8] below, the authors suggest an interactive system with a neural network to accomplish the
task. It should also be mentioned that more recent developments, such as those outlined in
References [2], [7]-[9] below, all use some sort of interactive or relevance-based method to
increase the accuracy and robustness of the system.

The algorithms disclosed in the prior art references listed above generally do not
assume three dimensional dependence among the images, such that when comparing images
there is no difference in the processing of the data whether the images are taken from the
same scene or from different scenes. If it is known that the images are all from the same
general location, then other features can be used to compare the images. More specifically, if
the images are taken using cameras with known calibration parameters, significant geometric
and spatial data can be obtained that can be used for comparisons and sorting.

What is needed is a method, system and computer program that enables images
taken from the same scene to be searched and sorted in an efficient manner.
Summary of the Invention

One aspect of the present invention consists of a method, system and computer
program for ranking images taken from a single scene based on the visual coverage that the
images provide for a particular object or region in space. In a particular aspect of the
invention, it is first determined whether or not the object or region is present in the images.
Therefore, the ranking in a first stage distinguishes between images that include and those
that do not include the desired object or region. Second, from the remaining images, those
images that have a 'better' view of the particular object or region are separated. In a
particular aspect of the present invention, means is provided for selection of images that
have this 'better' view.

In another aspect of the present invention, there is provided a method of searching
scenes in images, retrieving images, and/or localizing objects in images, characterized by:
obtaining a plurality of images from one or more cameras, the plurality of images including
at least two images including a view of a single area or object shown in a scene; selecting a
particular area of interest or object of interest in the scene; and iteratively establishing a
sub-set of the plurality of images that are probably of interest for viewing the particular
area of interest or object of interest by: (i) determining a probability distribution of the
plurality of images based on location data and data regarding the geometry of an environment
of the scene established for the area of interest or object of interest; (ii) refining the
probability distribution by obtaining user input regarding: (a) one or more of the sub-set of
the plurality of images that the user considers to be most relevant from the current sub-set
of the plurality of images; and (b) selection of the particular area of interest or object of
interest in the one or more most relevant images; and (iii) updating the sub-set of the
plurality of images based on the user input of (ii).

Brief Description of the Drawings

A detailed description of several embodiments of the present invention is provided herein 
below by way of example only and with reference to the following drawings, in which: 

Figure 1 is a system resource diagram illustrating the principal resources of the system of the
present invention, in accordance with one particular embodiment thereof.

Figure 2 is a program resource diagram illustrating the principal resources of the computer
program of the present invention, in accordance with one particular embodiment thereof.

Figure 3 is a process flowchart illustrating the steps of the method of the present invention,
in accordance with one particular embodiment thereof.

Figure 4a is an illustration of a representative interface for the computer program of the 
present invention, in a particular aspect thereof. 

Figure 4b is an illustration of another representative interface for the computer program of
the present invention, in a particular aspect thereof.

Figure 5a is a diagram that illustrates the method for identifying points within the Field of
View, or FOV.

Figure 5b is a diagram that illustrates the method for determining whether a point is within
the FOV.

Figure 6 illustrates a multi-camera system with overlapping FOV's. 

Figure 7 illustrates a series of images taken from a scene that include a plurality of images, 
the images including a plurality of images that show a single image in a plurality of views 
thereof. The images are in a random order. 

Figure 8 illustrates the selection of the particular object and the sorting of the images in
order of relevance to providing views of the particular object, in accordance with the present
invention.

Figure 9 illustrates the fifteen highest-ranked images after selecting from the recycle bin
twice, in accordance with the present invention.

Figure 10 shows the highest-ranked images after selecting from the recycle bin three times,
in accordance with the present invention.

In the drawings, preferred embodiments of the invention are illustrated by way of example.
It is to be expressly understood that the description and drawings are only for the purpose of
illustration and as an aid to understanding, and are not intended as a definition of the limits
of the invention.

Detailed Description of the Invention 

Fig. 1 illustrates the system of the present invention, in one particular embodiment
thereof. Typically the system includes, or is linked to, a camera network or camera array
(10). As explained below, the present invention enables scene searching, image retrieval and
object localization in relation to a plurality of images, in which the plurality of images
includes at least two images that include views of a particular object or scene taken by a
camera. While the present invention can be practiced in relation to such images taken by a
single camera, typically the invention is practiced in relation to a plurality of cameras linked
in the camera network or camera array (10) depicted in Fig. 1.

Depending on the type of cameras used in the camera array (10), if the camera array
generates digital images, then the camera array (10) is typically linked to an IP network (12),
and the digital images are stored to the digital archive (12). If the camera array (10)
generates analog images, then the camera array (10) is linked to an analog network (14) and
the image recording is converted (16) by operation of a suitable analog to digital converter,
and then the resulting digital images are stored to the digital archive (12).

The computer system of the present invention is generally illustrated as a
Computation Means (20) in Fig. 1, which in a typical implementation of the present
invention consists of the computer program (22) (best understood by reference to Fig. 2) of
the invention loaded on a computerized device (not shown) linked to the camera array (10).

In a particular embodiment of the present invention, the principal resources of the
computer program of the present invention are illustrated in Fig. 2. For the sake of clarity,
Fig. 2 illustrates the present invention in representative blocks, namely the interface block
(24), the storage block (26) and the computation block (28), for the sake of understanding the
principal functions of the computer program. The organization of the computer program
(22) into blocks (24), (26), (28) should not be understood as referring to a particular
computer program structure and therefore limiting in any way the present invention to a
particular computer program, or particular structure thereof. The functions of the computer
program of the present invention can be provided in more, fewer, or different blocks than
those illustrated in Fig. 2.

The functions of the computer program (22) are explained in greater detail below,
including by reference to Fig. 2.

Outline of Method Steps

The method of the present invention includes the following steps (Fig. 3 illustrates a
particular embodiment of this method, as explained in greater detail below):

1) Obtaining Images: a plurality of images are obtained, the plurality of images including at
least two images including a view of a single area or object shown in the scene.

Generally, the present invention assumes a relatively large number of images, which require
searching and sorting to derive one or more images comprising a subset of the universe of
images obtained, as particularized below. These images are obtained from the digital
archive (12), as particularized above. For the sake of clarity, the images are obtained from
the camera array (10), typically consisting of a relatively large array of video cameras, or
they can be still images of a particular environment taken with a single camera or multiple
cameras. The images are generally assumed to be available to a user in random order.

2) Calibration: In order to process the available images using the three dimensional
information from the environment, they are preferably captured using calibrated cameras.
The cameras of the camera array (10) are generally calibrated prior to the capture of the
images referred to in 1) above; however, as particularized below in 5), the present invention
involves, where necessary, further calibration of the cameras of the camera array (10) in
response to the search/retrieval/localization functions described below. Therefore, in a
particular aspect of the present invention, the camera array (10) is responsive to a series of
calibration commands from the computer program (22) in conjunction with the
search/retrieval/localization functions of the present invention. As an alternative to such
calibration, images can be captured and the cameras then calibrated using landmarks in a
manner that is known.

3) Object Selection: It is assumed that the user is interested in a particular object or region
in space depicted in at least two of the images obtained. The computer program (22) is
operable to take as input the selection of a point on an image within the presented set of
images. The selection occurs by operation of the interface illustrated in Fig. 4a, by which a
user selects an image point, typically using a cursor.

4) Probabilistic Localization: The image point that the user selects corresponds to a line in
three dimensions. Moreover, it is assumed that the user is not interested in a single point in
space, but rather a region. Therefore it is assumed that the user is interested in a region of the
image, with a certain probability distribution as further explained below. Using this
distribution and the available camera views, the images can be sorted and the location of the
region of interest can be narrowed down.

5) Refinement: The last three steps are repeated until a reasonable degree of localization is
achieved, as particularized below.

Steps 1) to 5) above are explained in greater detail below.

Camera Geometry and Calibration 

Camera calibration is a subject that has been extensively explored in the literature,
for example in References [10]-[12] below. The following describes camera calibration in the
context of the present invention, and in particular calibration as described in step 2) above.

Homogeneous representation 

Any line on a plane can be represented by an algebraic equation of the form

$$\alpha x + \beta y + \gamma = 0 \tag{1}$$

The different choices of $\alpha$, $\beta$ and $\gamma$ determine the direction of the
different lines. Thus a line on a plane can be represented by a vector of the form
$(\alpha, \beta, \gamma)^T$. Any non-zero multiple of this vector represents the same line,
and the set of all of these multiples forms an



WO 2006/005187 



PCT/CA2005/001093 



7 

equivalence class called a homogeneous vector. The set of equivalence classes in
$\mathbb{R}^3$ forms the projective space $\mathbb{P}^2$.

Equation (1) also represents all the points $(x, y)$ that lie on the line
$l = (\alpha, \beta, \gamma)^T$. This equation can be written as a dot-product
$(x, y, 1) \cdot l = 0$, which is unique up to a scale factor. As a result, any point in
$\mathbb{R}^2$ can be represented as $\mathbf{x} = k(x, y, 1)^T$, with $(x, y)$ being the
actual co-ordinates in $\mathbb{R}^2$ and $k$ a constant scalar. This is the homogeneous
representation of a point on a plane, and is an element of the projective space
$\mathbb{P}^2$. Similarly, a point in 3D space is denoted by a 4-element vector of the form
$\mathbf{X} = \lambda(X, Y, Z, 1)^T$, where $\lambda$ is a constant scalar. Note that in the
disclosure below, small bold-faced letters such as $\mathbf{x}$ are used to denote 2D image
points, and capital bold-faced letters such as $\mathbf{X}$ are used to denote 3D world
points.
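By way of illustration only, the homogeneous representation described above can be sketched in Python. The function names `to_homogeneous`, `from_homogeneous` and `on_line` are assumptions made for this sketch and do not appear in the disclosure.

```python
def to_homogeneous(x, y, k=1.0):
    """Return k * (x, y, 1), one representative of the point's equivalence class."""
    return (k * x, k * y, k)


def from_homogeneous(p):
    """Recover the actual co-ordinates (x, y) by dividing out the scale factor."""
    return (p[0] / p[2], p[1] / p[2])


def on_line(line, p, tol=1e-9):
    """True when the dot-product of the line (alpha, beta, gamma) and the point is zero."""
    return abs(sum(l * c for l, c in zip(line, p))) < tol


line = (1.0, -1.0, 0.0)                # the line x - y = 0
p = to_homogeneous(2.0, 2.0, k=3.0)    # (6, 6, 3) is the same point as (2, 2, 1)
print(on_line(line, p), from_homogeneous(p))
```

Scaling either the point or the line by a non-zero constant leaves the incidence test unchanged, which is the sense in which both are unique only up to a scale factor.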

Calibration 

The cameras that are part of camera array (10) are generally assumed to be general
projective cameras (for example as particularized in Reference [10] below) and are generally
calibrated using linear methods in a manner that is known.

When an image is captured by a camera of the type particularized, there is a
transformation that takes the three dimensional coordinates of points in the real world and
maps them onto a two dimensional image plane. This transformation is described by the
following equation:

$$\mathbf{x} = P\mathbf{X} \tag{2}$$

where $\mathbf{X}$ is the coordinates of a point in the real world, and $\mathbf{x}$ is the
corresponding coordinates on the image plane. $P$ is the Projection Matrix that transforms the
3D points to 2D. As mentioned before, $\mathbf{x}$ and $\mathbf{X}$ are represented using
homogeneous coordinates, and so $P$ is a 3 x 4 matrix. Calibrating a camera as described in
the present invention consists generally of calculating the elements of this matrix. The
matrix can be decomposed to extract intrinsic and extrinsic parameters of the camera such as
the focal length, skew, rotation, translation, etc. In the next section, the method for
computing $P$ as shown in Reference [10] is discussed.
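As a minimal sketch of the transformation x = PX, the following Python fragment projects one homogeneous 3D point. The matrix `P` is a hypothetical camera (focal length of 500 pixels, principal point at (320, 240), located at the world origin) chosen purely for illustration; it is not a calibration result from the disclosure.

```python
def project(P, X):
    """Apply x = P X for a 3x4 matrix P and a homogeneous 3D point X, then normalise."""
    x = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return (x[0] / x[2], x[1] / x[2])


# Hypothetical projection matrix, assumed for this example only.
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

pixel = project(P, (0.2, -0.1, 2.0, 1.0))
print(pixel)
```

The division by the third component removes the arbitrary scale factor of the homogeneous image point.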
Solving for P

In order to solve for the projection matrix, we represent $P$ in the following manner:

$$P = \begin{bmatrix} P^{1T} \\ P^{2T} \\ P^{3T} \end{bmatrix} \tag{3}$$

where $P^{kT}$ is the $k$-th row of the matrix $P$.

Now, it is desired to solve for $P$ such that $\mathbf{x}_i = P\mathbf{X}_i$, where $i$
indicates the $i$-th known point used for calibration and
$\mathbf{x}_i = (x_i, y_i, w_i)^T$ is the image coordinates of the real world point
$\mathbf{X}_i$. Since homogeneous coordinates are being used, everything is unique up to a
scale factor. Hence, $\mathbf{x}_i$ and $P\mathbf{X}_i$ are only in the same direction, and
not necessarily equal. This leads to the following cross-product equality:

$$\mathbf{x}_i \times P\mathbf{X}_i = \mathbf{0} \tag{4a}$$

Then using (3):

$$\begin{pmatrix} y_i P^{3T}\mathbf{X}_i - w_i P^{2T}\mathbf{X}_i \\ w_i P^{1T}\mathbf{X}_i - x_i P^{3T}\mathbf{X}_i \\ x_i P^{2T}\mathbf{X}_i - y_i P^{1T}\mathbf{X}_i \end{pmatrix} = \mathbf{0} \tag{4b}$$

Now, $P^{kT}\mathbf{X}_i = \mathbf{X}_i^T P^k$ for $k = 1, \ldots, 3$, which results in the
following:

$$\begin{bmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \\ -y_i\mathbf{X}_i^T & x_i\mathbf{X}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} P^1 \\ P^2 \\ P^3 \end{pmatrix} = \mathbf{0} \tag{4c}$$

Note that $w_i$ are the scale factors for $\mathbf{x}_i$ ($w_i = 1$ when the image point is
written as $(x_i, y_i, 1)^T$). Now let $p$ be the vector containing the entries of the
matrix $P$ and $A$ be the leftmost matrix in (4c). Consequently, it is necessary to
effectively solve the equation $Ap = 0$. Note that only two of the three rows that each point
contributes to the matrix $A$ are linearly independent; thus to find a solution, at least six
points are needed, since there are 11 degrees of freedom in the matrix $P$ and each point
correspondence results in two linearly independent equations. It must however be noted that
it is only in the absence of noise that the point correspondences are 'perfect' and using six
points to find a solution results in the unique and correct solution. In practice, however,
the measurements are not perfect and as a result generally many point correspondences are
required to solve the system. These correspondences (more than six) result in an
over-determined system that must be solved while minimizing some error measure. A standard
approach is to minimize $\lVert Ap \rVert$ under the constraint that $\lVert p \rVert = 1$
(the norm of $P$ is of no consequence since $P$ is only defined up to a constant scale factor)
[10]. This problem is the same as finding the minimum of
$\lVert Ap \rVert / \lVert p \rVert$. The solution is the unit eigenvector of $A^T A$ with the
least eigenvalue. This is the same as the unit singular vector corresponding to the smallest
singular value of $A$.
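The linear solution described above can be sketched with NumPy as follows. This is an illustrative sketch only: the function name `dlt_projection_matrix`, the synthetic matrix `P_true` and the eight test points are assumptions made for the example, image points are taken in the normalised form (x, y, 1), and no data normalisation or noise handling is included.

```python
import numpy as np


def dlt_projection_matrix(world_pts, image_pts):
    """Estimate P (3x4) from n >= 6 correspondences by minimising ||A p|| with ||p|| = 1."""
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        Xh = np.array([X, Y, Z, 1.0])
        zero = np.zeros(4)
        # Two linearly independent rows per correspondence, with w_i = 1.
        rows.append(np.concatenate([zero, -Xh, y * Xh]))
        rows.append(np.concatenate([Xh, zero, -x * Xh]))
    A = np.array(rows)
    # The right singular vector for the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)


# Synthetic, noise-free check with a hypothetical camera.
P_true = np.array([[500.0, 0.0, 320.0, 10.0],
                   [0.0, 500.0, 240.0, 20.0],
                   [0.0, 0.0, 1.0, 5.0]])
world = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                  [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 2]], dtype=float)
proj = (P_true @ np.c_[world, np.ones(len(world))].T).T
image = proj[:, :2] / proj[:, 2:3]

P_est = dlt_projection_matrix(world, image)
P_est = P_est / P_est[2, 3] * P_true[2, 3]   # fix the arbitrary scale for comparison
```

Because the estimate is defined only up to scale, the last line rescales it before any comparison with the matrix used to generate the data.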

Spatial Likelihood Functions (SLFs) 

One of the aspects of the present invention, as particularized below, involves
localization of the particular object or scene. This localization depends, in a particular
embodiment of the present invention, on localization as described under this heading.

In most practical settings, it is extremely difficult, if not impossible, to localize an
event or phenomenon deterministically and perfectly accurately using sensors. As a result, a
probabilistic approach is preferred to a deterministic one. Under these conditions,
preferably the following is obtained: the probability mass function of $\mathbf{X}$, the
object location, or some function that is proportional to $P(\mathbf{X})$. It should be
understood, however, that for the purposes of localization only a monotonically decreasing
function is required, as opposed to an actual distribution:

$$\Gamma(\mathbf{X}) = \psi\left(P(\mathbf{X}^* = \mathbf{X} \mid \Theta)\right) \tag{5}$$

where $\Gamma(\mathbf{X})$ is the SLF at spatial location $\mathbf{X}$, $\mathbf{X}^*$ is the
true location of interest, $\Theta$ represents all available data, and $\psi$ is a
monotonically decreasing function of its argument.
SLF Generation

It is assumed in this part of the disclosure that all of the cameras forming part of
camera array (10) have been perfectly calibrated. Thus the calibration matrix $P_i$
corresponding to the $i$-th camera is available for each camera.

The user-selected image point $\mathbf{x}_{\mathrm{sel}}$ corresponds to a line from the
centre of the camera through the selected point on the image plane, intersecting the selected
object/point in 3D space. It is assumed that the objective is to select a region in space or
an object of finite size, as opposed to a single point in space. There is also a certain
amount of error associated with the point selected by the user. The selection is done in a
probabilistic manner. By looking at the image plane and taking the selected point as the mean
of a Gaussian distribution with a user-defined variance, a region of space around the mean is
selected, with the size determined by the variance. Thus every point that lies on the image
plane, regardless of whether it is in the FOV of the camera or not, will have a likelihood
value associated with it. This is the likelihood that the point is the point of interest.
Now, since every point in space as seen by the camera lies somewhere on the image plane, it
will have an associated probability value. In this way the likelihood of every line through
the camera centre is determined. Note that the variance of the Gaussian distribution
determines how large a volume is desired for the final localization, and this should be
varied for different applications.

To generate the SLF, the volume of space to be considered is first determined. This
can be taken to be a cube of side length U with the first camera centre $C_1$ at the centre
of one face, 'looking' inside the cube. Now let $\chi$ be the set of all points in this
volume of space. Then, likelihood values can be assigned to one such cube corresponding to
the region of interest as observed by the first camera by looking at the projection of each
point in the defined space on the image plane. Note that for successive SLFs, the same
$\chi$ that was defined with the first camera located at the centre of one face is
considered; this however does not have to be the case, and as long as the volume is a fixed
one, the system will work properly.



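Thus the projection of the SLF on the image plane will have the following form:

$$\Gamma(\mathbf{x}) \propto \exp\left(-\frac{(x - x_{\mathrm{sel}})^2}{2\sigma_x^2} - \frac{(y - y_{\mathrm{sel}})^2}{2\sigma_y^2}\right) \tag{6}$$

where $\mathbf{x}_{\mathrm{sel}} = (x_{\mathrm{sel}}, y_{\mathrm{sel}}, 1)^T$ is the selected
image point, $\mathbf{x} = P_i\mathbf{X} = (x, y, 1)^T$, and $\mathbf{X}$ is the 3D
coordinates of a point in the volume of space to be analyzed. Now $\sigma_x$ and $\sigma_y$
determine the size of the volume that the system is focusing on and may vary depending on the
required search resolution. Without loss of generality, it can be assumed that these two
values are equal.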

It must be noted that this SLF never assumes zero as a value. The Gaussian decays to
zero at infinity. This also allows for easier computation of the true 3D SLF since, in
general, there will be points in $\chi$ that are not seen by the camera under consideration.
Projecting these points using $P_i$ results in pixel coordinates outside the resolution of
the camera, and thus a low probability that the point is of interest to the user.
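The SLF generation described above can be sketched in Python as follows. The function name `slf`, the hypothetical projection matrix `P`, the three sample points and the single standard deviation `sigma` (standing in for equal values of the two variances) are assumptions made for this illustration.

```python
import math


def project(P, X):
    """Project a homogeneous 3D point with the 3x4 matrix P and normalise."""
    x = [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]
    return (x[0] / x[2], x[1] / x[2])


def slf(P, points, x_sel, sigma):
    """Give each 3D point the Gaussian likelihood of its projection, then normalise."""
    values = []
    for X in points:
        u, v = project(P, (X[0], X[1], X[2], 1.0))
        d2 = (u - x_sel[0]) ** 2 + (v - x_sel[1]) ** 2
        values.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(values)
    return [v / total for v in values]


# Hypothetical camera and a tiny sample of the analysed volume (assumed values).
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
points = [(0.0, 0.0, 2.0), (0.4, 0.0, 2.0), (0.0, 0.0, 4.0)]
values = slf(P, points, x_sel=(320.0, 240.0), sigma=20.0)
print(values)
```

The first and third sample points lie on the same line through the camera centre, so they receive the same likelihood; a single selection constrains a line in space rather than a point.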

Spatial Reliability Functions (SRFs)

Another aspect of the present invention involves assessing whether a particular point
lies in the FOV of a particular camera and how reliable the access of the particular camera
to the particular point is. This is determined by assessing the Spatial Reliability
Functions of particular cameras forming part of the camera array (10), as explained below.

Whenever multiple sensors are used in an environment to gather information
regarding an event or a phenomenon at some location, the data obtained using each sensor
has a certain level of reliability associated with it. This level of reliability may be due
to the proximity of the sensors to the phenomenon of interest, the intrinsic properties of the
sensors, or other factors that may be caused by the structure of the sensor network and the
environmental setting. For example, the data obtained from an acoustic sensor (e.g. a
microphone) closer to a sound source is more reliable (has a higher signal to noise ratio)
than that obtained from a sensor which is far away. Therefore, for every sensor in the system,
a probability value can be assigned to every point in the space that is of interest. This is a
spatial reliability function, and for each sensor, it represents the likelihood of the
reliability of any information obtained regarding a specific spatial coordinate. If it has a
value of unity at a specific location, then this means that the information obtained regarding
that location is perfectly accurate and is not corrupted by any type of noise; and if it has a
value of zero, then the information obtained is completely inaccurate.

SRF Generation 

To generate an SRF for the $i$-th camera, we first find the set of all points $\beta_i$
that are both in the analyzed volume $\chi$ and in the FOV of the $i$-th camera,
$\mathcal{F}_i$:

$$\mathcal{F}_i = \{\mathbf{X} = (X, Y, Z) : \mathbf{X} \in \text{FOV of the } i\text{-th camera}\}$$

$$\beta_i = \{\mathbf{X} = (X, Y, Z) : \mathbf{X} \in \{\chi \cap \mathcal{F}_i\}\}$$

Then reliability values are assigned to each point in $\beta_i$ according to a monotonic
radial decay. This is the SRF for the $i$-th camera, $\rho_i(\mathbf{X})$, where the maximum
is at the camera centre (slightly in front of the camera, in fact).

Identifying points within the FOV 

There are a number of methods that could be used to determine $\mathcal{F}_i$, the set of
points in the FOV. One method is to find the lines that originate from the centre of the
camera and go through the four corners of the image plane, and then use the planes spanned by
adjacent pairs of these lines as boundaries of the FOV. This is shown in Figure 5(a). One
first has to find the equation of the lines. Two points on each line are known, and those
yield an equation. Take $l_j$, $j \in \{1, 2, 3, 4\}$, to be a line going through one of the
corners, and $\mathbf{x}_j$ to be the corresponding corner point on the image plane. The
equation of $l_j$ then becomes [10]:

$$\mathbf{X}(\eta) = P^{+}\mathbf{x}_j + \eta C$$

where $C$ is the coordinates of the camera, and $\mathbf{X}(\eta)$ corresponds to any point
that lies on the line. $P^{+}$ is the pseudo-inverse of the projection matrix $P$ and is
defined as the following:

$$P^{+} = P^T (P P^T)^{-1}$$

thus $P P^{+} = I$. This leads to the following projection of points on the line:

$$\mathbf{x} = P\mathbf{X}(\eta) = P P^{+}\mathbf{x}_j + \eta P C = \mathbf{x}_j$$

since the coordinates of the camera when projected using $P$ map to $(0, 0, 0)^T$. Now, any
point on this line maps onto the same point on the image plane. To determine whether a
point is in the FOV or not, it is determined whether the point lies in front of or behind the
camera. Any point $\mathbf{X} = (X, Y, Z, T)^T$ in space lies in front of the camera if it has
a positive depth [10], defined by:

$$\operatorname{depth}(\mathbf{X}; P) = \frac{\operatorname{sign}(\det M)\, w}{T\, \lVert m^{3} \rVert}$$

where $P = [M \mid p_4]$ is the projection matrix for the camera,
$\mathbf{X} = (X, Y, Z, T)^T$, and $w$ is the third component of $P\mathbf{X}$. $m^{3T}$ is
the third row of $M$, and $m^{3}$ is thus a vector in the positive axial direction. To find
whether a point is actually in the field of view of the camera, the planes shown in Figure
5(b) are established. Then the angle between the line connecting the point in question to the
camera centre and each of the planes can be determined. Knowing the planes P1, P2, P3 and P4,
each visible point can only be in a certain range of angles from the planes, and it can thus
be determined whether the point is in the FOV of the camera.
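The back-projected line through an image corner can be checked numerically with a short NumPy sketch. The matrix `P` is a hypothetical camera assumed for the example, the camera centre `C` is taken as the null vector of `P`, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical projection matrix, assumed for this example only.
P = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

P_pinv = P.T @ np.linalg.inv(P @ P.T)   # pseudo-inverse, so that P @ P_pinv = I
C = np.linalg.svd(P)[2][-1]             # camera centre: the null vector of P

x_j = np.array([0.0, 0.0, 1.0])         # one image corner in homogeneous form
ray_points = [P_pinv @ x_j + eta * C for eta in (0.5, 2.0, -3.0)]
reprojected = [P @ X for X in ray_points]
print(reprojected)
```

Every point generated along the line reprojects to the same corner, because the camera centre contributes nothing to the projection.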

Although the technique described above works well, it is not the easiest method for
practical purposes. The easiest way to determine whether a point lies in the FOV is to
project it onto the image plane using $P$ and see if it is within the image resolution of the
camera. For example, for a camera with resolution 640 x 480, if the projected point is in
[0, 639] in the horizontal direction and [0, 479] in the vertical direction, then the point is
in the FOV. Note that the absolute values in the range depend on the pixel which is taken to
be the origin. Therefore, in order to find $\beta_i$, we can take every point in $\chi$ and
see whether it lies in the appropriate range of the image plane coordinates, and if so then
the point is also in $\beta_i$. Also note that we still have to use the depth function to
determine whether the point is in front of or behind the camera.
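The projection-based FOV test, combined with the depth check, can be sketched in Python as follows. The function names `det3`, `depth` and `in_fov`, the default 640 x 480 resolution with the origin at pixel (0, 0), and the hypothetical matrix `P` are assumptions for this sketch; points are taken with T = 1.

```python
import math


def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def depth(P, X):
    """Depth of the 3D point X = (X, Y, Z), taking T = 1, for P = [M | p4]."""
    Xh = (X[0], X[1], X[2], 1.0)
    w = sum(P[2][c] * Xh[c] for c in range(4))
    M = [row[:3] for row in P]
    sign = 1.0 if det3(M) > 0 else -1.0
    return sign * w / math.sqrt(sum(m * m for m in M[2]))


def in_fov(P, X, width=640, height=480):
    """True if X is in front of the camera and projects inside the image."""
    if depth(P, X) <= 0:
        return False
    Xh = (X[0], X[1], X[2], 1.0)
    x = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    u, v = x[0] / x[2], x[1] / x[2]
    return 0 <= u <= width - 1 and 0 <= v <= height - 1


# Hypothetical camera, assumed for this example only.
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
print(in_fov(P, (0.0, 0.0, 2.0)), in_fov(P, (0.0, 0.0, -2.0)), in_fov(P, (5.0, 0.0, 2.0)))
```

The depth test is applied first because a point behind the camera can still project to pixel coordinates inside the image range.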




Viable SRFs for cameras 

It can be assumed that the ability of a camera in 'observing' objects decreases with
distance. Therefore the SRF must decay monotonically with distance from the camera centre.
The rate of the decay depends on the resolution of the camera and more generally on the
'quality' of the camera and may be determined experimentally. A possible SRF definition is
given below:

$$\rho_i(\mathbf{X}) = \begin{cases} \max\left(0,\; k - e^{\gamma \lVert \mathbf{X} - C_i \rVert}\right) & \text{if } \mathbf{X} \in \beta_i \\ 0 & \text{if } \mathbf{X} \notin \beta_i \end{cases} \tag{11}$$

where $C_i$ is the coordinates of the camera, $\rho_i$ is the SRF, $k$ is a constant that
determines how much more the closer regions are emphasized as compared to the regions farther
away, and $\gamma$ is the decay rate, determined experimentally. This exponential decay allows
the system to emphasize regions of space that are closer to the camera and punish those
regions farther away. As a result, cameras that are closer to the region of interest will be
considered more reliable.

In the experiments conducted here, an exponential SRF has been used, with $k = 30$
and $\gamma = 30\ \text{cm}^{-1}$.
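A minimal Python sketch of an exponential SRF of this kind is given below, assuming the form max(0, k - exp(gamma * distance)) inside the FOV and zero outside. The function name `srf`, the `in_beta` flag and the value `gamma=0.01` are assumptions chosen so that the decay is visible over a few hundred distance units; they are not the experimental values quoted above.

```python
import math


def srf(X, C, in_beta, k=30.0, gamma=0.01):
    """Reliability of camera centre C for point X; zero outside the camera's FOV set."""
    if not in_beta:
        return 0.0
    d = math.dist(X, C)
    # The exponent is capped only to avoid floating-point overflow for distant points.
    return max(0.0, k - math.exp(min(gamma * d, 700.0)))


camera = (0.0, 0.0, 0.0)
near = srf((0.0, 0.0, 100.0), camera, in_beta=True)
far = srf((0.0, 0.0, 200.0), camera, in_beta=True)
print(near, far)
```

With these assumed parameters the function is largest at the camera centre and reaches zero once exp(gamma * distance) exceeds k, after which the clamp keeps it at zero.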

Further Description of the Method 

In accordance with a particular aspect of the present invention, the camera array (10)
includes at least two cameras having overlapping FOVs, as shown in Figure 6. So, any given
object, or region in space, lies within the common FOV of more than one camera in this
particular arrangement. Therefore, images of the object or region of interest are available
from different angles and different positions, i.e. images are taken from different distances
from the region. Using this, we can ignore the irrelevant images, i.e. those obtained using
cameras whose FOVs do not contain the desired spatial coordinates. Furthermore, we can
rank the remaining images based on the distance of their corresponding cameras from the
region of interest, and on whether they completely contain the desired spatial coordinates or
not. This is accomplished by assigning an SRF to each camera and generating an SLF, as
described above. The computer program of the present invention includes computer
instructions that, when provided to a computerized device, are operable to provide this
function.

A representative algorithm is described which, when provided as computer
instructions forming part of the computer program of the present invention, provides the
function particularized. It should be understood that other algorithms providing this function
are possible.

The algorithm initially generates the SRFs for all the cameras using Equation 11.
After the user selects the first view and the region of space within that view, the
corresponding SLF is generated and is combined with the SRFs by pointwise
multiplication and normalization. To do this, a normal distribution with its mean at the
selected point is assigned to pixels in the image using Equation 6, representing the
probability of the pixels being the point of interest on the image. Then every point in the
environment is projected onto the image plane of the active camera using Equation 2. The
likelihood of any point in the environment being of interest is set equal to the likelihood
value associated with its projection onto the 2D image plane. It must be noted that this is an
interactive localization based on feedback from the user, and the SLFs do not correspond to
the same location after each iteration; they instead correspond to lines and regions in space
that should ideally have an intersection at the point of interest.

To navigate through the available camera viewpoints, there is a need to assign
degrees of validity to each camera forming part of the camera array (10), for the particular
selected scene. Immediately after the first selection, one SLF and all the SRFs for the
cameras are available. According to Reference [23] cited below, $E_{\Gamma(X)}[\rho_i]$ can be used
instead of $E_{P(X)}[\rho_i]$ as a measure of the reliability. Therefore, to rate each remaining camera
in the camera array (10), one can multiply the SLF with each SRF and sum all the likelihood
values. This gives a measure of the level of access that each camera has to the selected
region. In other words, it shows how well each camera can observe the selected region. The
user now moves to another view and selects another point. Again an SLF is generated, and
this time it is multiplied with the previous SLF and normalized; this further narrows down
the region of interest, and again the remaining camera views are ranked based on this
combined SLF and their SRFs. The process is continued until the desired view is found. Alg.
1 shows step by step how ISL works.
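
The ranking and update steps just described can be illustrated with a short Python sketch. It assumes, purely for the example, that the SLF and each SRF are stored as dictionaries keyed by the same grid points; the function names `combine` and `rank_cameras` are hypothetical.

```python
def combine(slf_a, slf_b):
    """Pointwise product of two SLFs over the same grid, renormalized to sum to 1."""
    product = {p: slf_a[p] * slf_b[p] for p in slf_a}
    total = sum(product.values())
    if total == 0:
        return product
    return {p: v / total for p, v in product.items()}

def rank_cameras(slf, srfs):
    """Score each camera by summing SLF * SRF over the grid.

    Returns the camera ids ordered from highest to lowest score,
    together with the scores themselves.
    """
    scores = {cam: sum(slf[p] * srf[p] for p in slf) for cam, srf in srfs.items()}
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores
```

After each new selection, `combine` would be applied to the running SLF and the newly generated one, and `rank_cameras` would be called again on the result, so the ordering can change as the region of interest narrows.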

The spatial coordinates of the location of the object of interest can be estimated by
taking the expected value of the SLF when the SRF is factored in. This means that to find the
location of the object or region of interest using a particular camera, the SLF is multiplied
with the SRF and then the maximum coordinates are projected onto the image plane. The
accuracy of this estimate increases as the number of selections by the user is increased. This
is because after each selection, the SLF becomes more and more concentrated in a very small
volume, which is common among all the individual SLFs corresponding to each selection
from a particular viewpoint.
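
As a rough illustration of this estimate, the following Python sketch weights the SLF by one camera's SRF and returns both the weighted mean location and the grid point with the largest weight, since the description mentions both an expected value and maximum coordinates. The dictionary-based grid and the name `estimate_location` are assumptions for the example; projecting the result onto the image plane is left out.

```python
def estimate_location(slf, srf):
    """Estimate the 3D location from an SLF weighted by one camera's SRF.

    Both inputs map 3D grid points (x, y, z) to non-negative values.
    Returns (mean, peak): the weighted mean position and the grid point
    with the highest weight, or (None, None) if all weights are zero.
    """
    weights = {p: slf[p] * srf[p] for p in slf}
    total = sum(weights.values())
    if total == 0:
        return None, None
    mean = tuple(
        sum(w * p[k] for p, w in weights.items()) / total for k in range(3)
    )
    peak = max(weights, key=weights.get)
    return mean, peak
```

As more selections are combined, the weights concentrate in a smaller volume, so the mean and the peak would be expected to move closer together.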



Algorithm 1

1: Generate SRFs for all cameras: $\rho_i(X) = \max(0, \kappa - \alpha \|X - c_i\|)$ if $X \in B$, and $\rho_i(X) = 0$ if $X \notin B$.

2: Allow user to select an initial viewpoint $V_j$, $j = 0$.

3: Allow user to select an initial point $x_{u,j}$ from $V_j$.

4: Based on the environment, select the volume of space $X$ to analyze.

5: Compute $P(x_{u,j} = X)$, the SLF for the initial point, using the projection of the SLF: $P(x = x_{u,j}) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}\|x - x_{u,j}\|^2\right\}$

6: Let $\Gamma(X) = P(x_{u,j} = X)$ be the most recently obtained SLF.

7: Compute the expectation of the SRF according to $\Gamma(X)$: $\psi_i = E_{\Gamma(X)}[\rho_i] = \iiint \Gamma(X)\,\rho_i(X)\,dX \approx \sum_X \Gamma(X)\,\rho_i(X)$

8: Rank cameras in order of highest $\psi_i$ to lowest and display views.

9:

10: Allow user to select the next viewpoint $V_j$.

11: Allow user to select the next point $x_{u,j}$ from $V_j$.

12: Compute the SLF for the latest selection, $\Gamma'(X)$.

13: $\Gamma(X) \leftarrow \Gamma(X)\,\Gamma'(X)$.

14: Go to step 7 and repeat until the desired view is selected.
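
Step 1 of the algorithm can be illustrated with a small Python sketch of a distance-based reliability function. It assumes a reliability that falls off linearly with distance from the camera and is zero outside the region the camera can observe; the parameter names `kappa` and `alpha`, their default values, and the `in_view` predicate are hypothetical choices for the example and are not taken from Equation 11.

```python
import math

def spatial_reliability(point, camera_center, in_view, kappa=1.0, alpha=0.1):
    """Reliability of a camera at a 3D point.

    Zero if the point is outside the camera's observable region
    (as decided by the caller-supplied predicate `in_view`); otherwise
    a value that decreases linearly with distance and is clipped at zero.
    """
    if not in_view(point):
        return 0.0
    distance = math.dist(point, camera_center)
    return max(0.0, kappa - alpha * distance)
```

Evaluating this function at every grid point for every camera would give the set of SRFs that the later steps combine with the SLF.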

The method in greater detail is best understood by reference to Figure 3. The digital
archive (12) (shown in Figure 1) contains a plurality of images taken from a scene filled with
a number of objects. A display (not shown) linked to the computerized device (not shown)
running the computer program (22) displays the representative user interface (23) of the
computer program (22), shown in Fig. 4a. The user interface (23) enables the user to view
multiple camera angles of a scene in a main viewing area (25), by means of a series of
"FUNCTION KEYS" provided by the user interface, for example "FORWARD",
"PAUSE/PLAY", "BACK", "LIVE", "ZOOM" and so on.

The user interface (23) is operable to permit the user to select an object shown in the
main view (25). Thereafter, the system of the present invention: (1) estimates the 3D
location of the object selected (30), (2) determines the scene(s) of interest (32) (i.e. the area
of the object) (34), (3) estimates the 2D location of the object (36), (4) annotates the 2D location
(e.g. red outline) (38), (5) orders the scenes of interest from the various images according to
the most probable to relate to the selected object (40), and (6) displays the ordered scenes of
interest. If optimal localization is achieved (i.e. the selected object is visible in the desired
resolution) (42), the process has been completed, and the optimal image of the object is
viewed, or associated video is accessed and played back, reversed, or viewed in "LIVE" format
(44). If the user determines that optimal localization has not been achieved, then the user
can access a number of operations that enable further selection of the desired object or area,
and thereby the process begins again until, on an iterative basis, the desired localization is
achieved.

For example, suppose the user interface (23) displays 15 of the images to the user at any given
time, and the user wants to find the best 15 images of a particular object, for example a recycling
bin. The initial 15 images, which are made available at random, are shown in Figure 7. The recycling
bin is selected from the 12th image in this set (the top-left image is taken as the first and the
bottom-right image as the 15th). The algorithm then assigns different rankings to each
image, based on the selection of the recycling bin, and reorders the images, displaying the
top-ranked as in Figure 8. Taking the expectation of the SLF over the whole space, the
location of the desired object is estimated and boxed in red. It can be seen that in
this iteration, four of the images include the object of interest, and in two, the location has
been properly estimated.

This selection process by the user is repeated another two times (to provide the
results of Figure 9), and the final ordering of the images is shown in Figure 10. All but one
of the images contain the recycling bin. Furthermore, the location of the bin has been
determined perfectly in every image. The 13th image is an anomaly in this experiment. This
image is ranked high due to a poor calibration at the time of taking the photograph, resulting
in incorrectly inferred spatial locations.

It is difficult to analyze quantitatively the performance of the system of the present
invention. An experiment was performed to quantify the quality of the image rankings as the
number of object selections increases. The same data set as in the previous experiment was
used. A set of 10 individuals were asked to manually give a score between 0 and 2 to the
objects in the images. Each individual would go through every image and look to see if the
objects in question are visible in the image. If the object is completely visible, a score of 2 is
given to it; if it is partially visible, a score of 1; and if it is not visible at all, a score of 0 is
assigned to the particular object for the particular image. The system is then used to select
the particular object, and the scores (averaged over all the individuals and objects) are then
plotted versus the image ranks given by the algorithm. It was found that the highest scores
given by the individuals participating in this test corresponded to the highest-ranked images
as determined in accordance with this invention.
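
The averaging step of this evaluation can be sketched in Python as below. The data layout (one dictionary of 0/1/2 visibility scores per individual, keyed by image identifier) and the name `mean_score_by_rank` are assumptions for illustration only; no scores from the actual experiment are reproduced here.

```python
def mean_score_by_rank(ranked_images, rater_scores):
    """Average the raters' visibility scores (0, 1 or 2) for each image,
    listed in the order the algorithm ranked the images.

    ranked_images: image ids, best-ranked first.
    rater_scores: one dict per individual, mapping image id to a score.
    """
    return [
        sum(scores[image] for scores in rater_scores) / len(rater_scores)
        for image in ranked_images
    ]
```

Plotting the returned list against rank position would give the kind of score-versus-rank curve described above; a curve that decreases with rank would indicate that the ranking agrees with the human judgments.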

The present invention makes it possible to find specific objects and regions of
interest in all of the images by looking at the spatial expectation of the spatial likelihood
function. This becomes especially useful in circumstances where a very large number of
cameras are available and the manpower to sort and search within the images is limited.

The described invention has a multitude of applications. In security and surveillance,
it can be used to reduce the number of human monitors, and increase the speed and
efficiency of monitoring large environments, like an airport or a casino. For these
applications, the system can further be combined with live-streamed video and tracking
systems. In advertising, it can be used to create very large-scale databases of images of the
item of interest, and make them available to potential clients digitally. As another example, a
building company can completely photograph a newly designed building in this fashion.

CLAIMS

1. A method of searching scenes in images, retrieving images, and/or localizing objects
in images, characterized by:

(a) obtaining a plurality of images from one or more cameras, the plurality of 
images including at least two images including a view of a single area or 
object shown in a scene; 

(b) selecting a particular area of interest or object of interest in the scene; and

(c) iteratively establishing a sub-set of the plurality of images that are probably 
of interest for viewing the particular area of interest or object of interest by: 

(i) determining a probability distribution of the plurality of images based
on location data and data regarding the geometry of an environment of
the scene established for the area of interest or object of interest;

(ii) refining the probability distribution by obtaining user input regarding:

(A) one or more of the sub-set of plurality images that the user
considers to be most relevant from the current sub-set of the
plurality of images; and

(B) selection of the particular area of interest or object of interest 
in the one or more most relevant images; and 

(iii) updating the sub-set of the plurality of images based on the user input
of (ii).

2. The method of claim 1, characterized by the further step of calibrating the one or 
more cameras to enable a transformation whereby three-dimensional coordinates of 
points within the field of view of the one or more cameras are mapped to a two- 
dimensional plane. 

3. The method of claim 1, characterized by the further step of generating a spatial
reliability function for each of the one or more cameras, and assigning the 
corresponding spatial reliability function to each of the one or more cameras. 

4. The method of claim 3, characterized by the further step of generating a spatial
likelihood function for the selected particular area of interest or object of interest, and
combining each of the applicable spatial reliability functions with the spatial
likelihood function, so as to establish the probability distribution.

5. The method of claim 4, characterized by the combination of the applicable spatial 
reliability function with the spatial likelihood function consisting of point-by-point 
multiplication and normalization. 

6. The method of claim 3, characterized in that the spatial reliability function is adjusted 
for decrease of reliability over distance. 

7. The method of claim 1, characterized by the further step of ranking the sub-set of the
plurality of images according to relevance and displaying such ranking to the user. 

8. A method of searching scenes in images, retrieving images, and/or localizing objects 
in images, characterized by: 

(a) displaying one or more images obtained from one or more cameras calibrated 
to provide location data and data regarding the geometry of an environment 
of a scene; 

(b) a user selecting an object or area of interest in the scene; 

(c) accessing a plurality of images associated with the object or area; 

(d) estimating the three-dimensional location of the object or area, and estimating 
the two-dimensional location of the object or area, so as to define location 
data for the object or area; 

(e) determining whether the object or area is present in the plurality of images, so
as to establish a first sub-set of the plurality of images consisting of images 
that include the object or area; 

(f) determining one or more second sub-sets of the plurality of images,
optionally including images from the first sub-set of images, iteratively
established to be probably of interest for viewing the object or area by:

(i) determining a probability distribution of the plurality of images based 
on the location data and data regarding the geometry of the 
environment of the scene established for the area of interest or object 
of interest; 

(ii) refining the probability distribution by obtaining user input regarding: 

(A) one or more of the images of the second sub-set of images that
the user considers to be most relevant from the current second
sub-set of the plurality of images; and

(B) selection of the particular area of interest or object of interest
in the one or more most relevant images; and

(iii) updating the current second sub-set of the plurality of images based on 
the user input of (ii). 

9. A system for searching scenes in images, retrieving images, and localizing objects in 
images, characterized in that the system includes: 

(a) at least one camera calibrated to provide location data and data regarding the 
geometry of an environment of a scene, for a plurality of images, the plurality 
of images including at least two images including a view of a single area or 
object shown in a scene; and 

(b) a computer linked to the at least one camera, the computer including a 
computation utility, the computation utility being operable on the computer 
to: 

(i) select a particular area of interest or object of interest in the scene; and

(ii) iteratively establish a sub-set of the plurality of images that are 
probably of interest for viewing the particular area of interest or object 
of interest by: 

(A) determining a probability distribution of the plurality of
images based on location data and data regarding the geometry
of the environment of the scene established for the area of
interest or object of interest;

(B) refining the probability distribution by obtaining user input
regarding:

(I) one or more of the sub-set of plurality images that the
user considers to be most relevant from the current sub-
set of the plurality of images; and

(II) selection of the particular area of interest or object of
interest in the one or more most relevant images; and

(C) updating the sub-set of the plurality of images based on the 
user input of (B). 

10. A computer program for searching scenes in images, retrieving images, and
localizing objects in images, characterized in that the computer program includes 
instructions operable on a computer to: 

(a) enable an interface with at least one camera calibrated to provide location 
data and data regarding the geometry of an environment of a scene, the 
plurality of images including at least two images including a view of a single 
area or object shown in a scene; and 

(b) provide a computation utility operable to enable a user to: 

(i) select a particular area of interest or object of interest in the scene; and 

(ii) iteratively establish a sub-set of the plurality of images that are
probably of interest for viewing the particular area of interest or object 
of interest by: 

(A) determining a probability distribution of the plurality of 
images based on location data and data regarding the geometry 
of an environment of the scene established for the area of 
interest or object of interest; 

(B) refining the probability distribution by obtaining user input
regarding: 

(I) one or more of the sub-set of plurality images that the 
user considers to be most relevant from the current sub-
set of the plurality of images; and

(II) selection of the particular area of interest or object of
interest in the one or more most relevant images; and 

(C) updating the sub-set of the plurality of images based on the 
user input of (B).